PSCI 3300.003 Political Science Research Methods
A. Jordan Nafa
University of North Texas
October 18th, 2022
Probability and Statistics Review
Basic Concepts
Random Variables
Sampling Distributions
Properties of Estimators
Theory assignment due Friday October 21st
\[ \definecolor{treat}{RGB}{27,208,213} \definecolor{outcome}{RGB}{98,252,107} \definecolor{baseconf}{RGB}{244,199,58} \definecolor{covariates}{RGB}{178,26,1} \definecolor{index}{RGB}{37,236,167} \definecolor{timeid}{RGB}{244,101,22} \definecolor{mu}{RGB}{71,119,239} \definecolor{sigma}{RGB}{219,58,7} \newcommand{normalcolor}{\color{white}} \newcommand{treat}[1]{\color{treat} #1 \normalcolor} \newcommand{resp}[1]{\color{outcome} #1 \normalcolor} \newcommand{sample}[1]{\color{baseconf} #1 \normalcolor} \newcommand{covar}[1]{\color{covariates} #1 \normalcolor} \newcommand{obs}[1]{\color{index} #1 \normalcolor} \newcommand{tim}[1]{\color{timeid} #1 \normalcolor} \newcommand{mean}[1]{\color{mu} #1 \normalcolor} \newcommand{vari}[1]{\color{sigma} #1 \normalcolor} \]
Implies the long-run relative frequency of some event \(\resp{A}\)
\(\Pr(\resp{A})\) is a real-valued function defined on a sample space
Probabilities have the following important properties
\(0 \le \Pr(\resp{A}_{\obs{i}}) \le 1 \quad \forall \quad \resp{A}_{\obs{i}}\)
\(\Pr(\resp{A}_{\obs{1}} + \dots + \resp{A}_{\sample{n}}) = 1\)
\(\Pr(\resp{A}_{\obs{1}} + \dots + \resp{A}_{\sample{n}}) = \Pr(\resp{A}_{\obs{1}}) + \dots + \Pr(\resp{A}_{\sample{n}})\)
Implies that events are exhaustive (2) and mutually exclusive (3)
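As a brief worked example, consider a single roll of a fair six-sided die, where the events \(\resp{A}_{1}, \dots, \resp{A}_{6}\) are the six faces: each \(\Pr(\resp{A}_{\obs{i}}) = 1/6\) satisfies (1), the faces are mutually exclusive and exhaustive, and by (2) and (3)

\[\Pr(\resp{A}_{1} + \dots + \resp{A}_{6}) = \frac{1}{6} + \dots + \frac{1}{6} = 1\]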
Random Variable
A variable that may take a range of values as defined by some stochastic process
It may be observed as either discrete or continuous
It has a Probability Density Function (PDF) that assigns probabilities to outcomes
A particular value that a random variable takes is called its Realization
First Moment (Mean)
Second Moment (Variance)
Third Moment (Skewness)
Fourth Moment (Kurtosis)
Probability Mass Function (PMF): The function by which the probability \(\pi\) is assigned to a given value \(\treat{x}_{\obs{i}}\) of a discrete random variable.
\(\pi(\treat{x}_{\obs{i}}) = \Pr(\treat{X}=\treat{x}_{\obs{i}}) \quad \forall \quad \obs{i} \in \{0,1, \dots,\sample{n}\}\)
Where \(\quad 0\le \pi(\treat{x}_{\obs{i}}) \le 1 \quad\) and \(\quad \sum \pi(\treat{x}_{\obs{i}}) = 1\)
In other words, the PMF is the function that yields the probability \(\pi\) that the realization of the random variable \(\treat{X}\) is equal to some observed discrete value \(\treat{x}\).
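The relative-frequency code that follows summarizes a `dice` data frame that is never constructed in these slides; a minimal sketch of how it might be simulated (the 10,000-roll count and the seed are assumptions):

```r
# Load the tidyverse for tibble() and the pipe
library(tidyverse)

# Set a seed so the simulation is reproducible (assumed seed)
set.seed(123)

# Simulate 10,000 rolls of two fair six-sided dice and record the sum
dice <- tibble(
  dice_roll = sample(1:6, 10000, replace = TRUE) +
    sample(1:6, 10000, replace = TRUE)
)
```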
# Calculate the relative frequencies
dice_probs <- dice %>%
group_by(dice_roll) %>%
summarise(totals = n()) %>%
mutate(prob = totals/sum(totals))
# Plot the PMF for each outcome of the dice
ggplot(dice_probs, aes(x = dice_roll, y = prob, fill = as_factor(dice_roll))) +
# Add a barplot geom to ggplot object
geom_bar(stat = "identity", show.legend = F, col = "black") +
# Adjust the parameters of the x axis
scale_x_continuous(breaks = seq(2, 12, 1)) +
# Adjust the parameters of the y axis
scale_y_continuous(
breaks = seq(0, 0.18, 0.03),
limits = c(0, 0.18),
expand = c(0, 0)
) +
# Add labels to the plot
labs(
title = "Probability Mass Function",
x = "Value of Dice Roll",
y = "Probability"
)
Continuous Random Variables can take on an uncountably infinite number of possible values.
The probability of observing any specific realization of \(\treat{X}\) is effectively zero.
We instead focus on the probability that \(\treat{X}\) falls within a given range of values such that \(\Pr(\treat{X} \ge \treat{x}_{\obs{i}})\) or \(\Pr(\treat{X} \le \treat{x}_{\obs{i}})\).
Probability Density Function (PDF) is the function that determines the realization of a continuous random variable.
The PDF gives the relative likelihood of drawing any specific value, and the exact probability of drawing a value within a given range.
The PDF of a random variable \(\treat{X}\) in one dimension is generally given by
\[\Pr(\treat{X} \in [a,b])=\int^{b}_{a} f(\treat{x})\,d\treat{x}\]
The random variable \(\treat{X}\) has a probability of falling within the range of the lower bound \(a\) and upper bound \(b\) that is derived from the definite integral of the probability density function from \(a\) to \(b\).
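The plots that follow draw on a `stdnorm_df` data frame that isn't built in these slides; a minimal sketch, assuming a regular grid of z-scores stands in for the random draws the column names suggest, together with a numerical check of an interval probability:

```r
# Load the tidyverse for tibble() and ggplot2
library(tidyverse)

# Grid of z-scores with the standard normal PDF and CDF; the column
# names mirror those used in the plotting code (std_rnorm suggests
# random draws, but a regular grid works for line plots)
stdnorm_df <- tibble(
  std_rnorm = seq(-4, 4, by = 0.01),
  std_dnorm = dnorm(std_rnorm),  # PDF values
  std_pnorm = pnorm(std_rnorm)   # CDF values, Pr(X <= x)
)

# Check an interval probability by integrating the PDF from a to b
integrate(dnorm, lower = -1.96, upper = 1.96)$value  # roughly 0.95
```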
# Plot the PDF of the normal distribution
ggplot(stdnorm_df, aes(x = std_rnorm, y = std_dnorm)) +
# Add a line geom for the PDF
geom_line(size = 2, color = "red") +
# Adjust the parameters of the x axis
scale_x_continuous(breaks = seq(-4, 4, 1), limits = c(-3.5, 3.5)) +
# Adjust the parameters of the y axis
scale_y_continuous(
breaks = seq(0, 0.45, 0.05),
limits = c(0, 0.45),
expand = c(0.005, 0)
) +
# Add labels to the plot
labs(x = "Z-Score", y = "Density")
To obtain actual probabilities we turn to the Cumulative Distribution Function (CDF) of a continuous random variable.
We encounter the same issue of the exact probability of observing a given value of \(\treat{X}\) being extremely small
Here \(f(\treat{x})\) represents the PDF while \(F(\treat{x})\) represents the CDF
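Concretely, the CDF is the integral of the PDF up to the value \(\treat{x}\):

\[F(\treat{x}) = \Pr(\treat{X} \le \treat{x}) = \int^{\treat{x}}_{-\infty} f(t)\,dt\]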
# Plot the CDF of the normal distribution
ggplot(stdnorm_df, aes(x = std_rnorm, y = std_pnorm)) +
# Add a line geom for the CDF
geom_line(size = 2, color = "red") +
# Adjust the parameters of the x axis
scale_x_continuous(breaks = seq(-4, 4, 1), limits = c(-3.5, 3.5)) +
# Adjust the parameters of the y axis
scale_y_continuous(
breaks = seq(0, 1, 0.2),
limits = c(0, 1),
expand = c(0.005, 0)
) +
# Add labels to the plot
labs(x = 'Z-Score', y = 'Pr(X \u2264 x)')
In the classic linear model it is assumed the outcome is continuous and unbounded.
When the outcome of interest is nominal, ordinal, or count data, regression will fail us in certain ways and to varying degrees.
We’ll spend the next few weeks introducing regression and thinking about when it fails and what to do about it.
The Central Limit Theorem is one of the most foundational concepts in modern statistics and is core to frequency-based conceptions of probability.
The CLT states that as the sample size increases, the distribution of the sample means will tend toward normality.
More precisely, if we sum independently and identically distributed variables, each with mean \(\mean{\mu}\) and finite variance \(\vari{\sigma}^{2}\), the suitably standardized sum will approximate a normal distribution.
To sustain statistical inference under a frequency-based framework, we effectively hedge all of our bets on the central limit theorem
The sampling distribution of a statistic is the probability distribution of a statistic obtained from repeated sampling.
Suppose our statistic is \(\theta\)
Let’s begin by simulating 100,000 observations from a Gamma distribution to represent our population
# Simulate 100,000 observations from a gamma distribution
gamma_dist <- tibble(r_dist = rgamma(100000, 5, 5))
# Plot a histogram of the specified gamma distribution
ggplot(gamma_dist, aes(x = r_dist)) +
# Add a histogram geom
geom_histogram(binwidth = 0.10, fill = "aquamarine", color = "black") +
# Adjust the parameters of the x axis
scale_x_continuous(breaks = seq(0, 4, 0.5), limits = c(0, 4)) +
# Adjust the parameters of the y axis
scale_y_continuous(
breaks = seq(0, 10000, 2000),
limits = c(0, 10000),
expand = c(0.006, 0)
) +
# Add labels to the plot
labs(x = "", y = "Frequency")
The central limit theorem says that the distribution of “draws” of some statistic from the population will tend to a normal distribution even if the population itself is non-normal
# Create an empty vector with length 500
gamma_dist_500 <- rep(NA, length.out = 500)
# Fill the matrix with the mean from 500 random samples of n = 100
for (i in seq_along(gamma_dist_500)) {
gamma_dist_500[i] <- mean(sample(gamma_dist$r_dist, 100, replace = T))
}
# Check the mean of the sampling distribution
mean(gamma_dist_500)

[1] 1.001585
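The for-loop above can also be written compactly with `replicate()`, and the CLT pins down not just the center but the spread of the sampling distribution. A self-contained sketch (the seed is an assumption; the Gamma(5, 5) parameters follow the earlier simulation) checking that the empirical standard error is close to \(\vari{\sigma}/\sqrt{n}\):

```r
set.seed(42)  # assumed seed for reproducibility

# Population: 100,000 draws from a Gamma(5, 5) distribution
pop <- rgamma(100000, shape = 5, rate = 5)

# Sampling distribution: means of 500 samples of size n = 100
theta_hat <- replicate(500, mean(sample(pop, 100, replace = TRUE)))

# CLT prediction: sd of the sampling distribution ~ sigma / sqrt(n),
# where sigma = sqrt(5 / 5^2) for a Gamma(5, 5) population
sd(theta_hat)              # empirical standard error
sqrt(5 / 5^2) / sqrt(100)  # theoretical value, about 0.045
```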
# Plot a histogram of the specified sample distribution
ggplot(as_tibble(gamma_dist_500), aes(x = value)) +
# Add a histogram geom
geom_histogram(binwidth = 0.01, fill = "cyan", color = "black") +
# Adjust the parameters of the x axis
scale_x_continuous(
breaks = seq(0.8, 1.2, 0.05),
limits = c(0.8, 1.2)
) +
# Adjust the parameters of the y axis
scale_y_continuous(
breaks = seq(0, 50, 5),
limits = c(0, 50),
expand = c(0.006, 0)
) +
# Add labels to the plot
labs(x = "\u03B8", y = "Frequency")
Our sampling distribution is the distribution of means obtained from repeated sampling
The mean of the sampling distribution is approximately 1.00
The variability around that mean is measured by the standard deviation of the sampling distribution
Inferentially, the standard error reported to you in a frequentist regression output is “the standard deviation of the sampling distribution.”
The problem of estimation is that, in practice, we generally have only one sample to work with
The statistic \(\hat{\theta}\) is our estimator
Often, we’re interested in the first moment of the sampling distribution
The first moment of the sampling distribution is its expected value
Formally, an estimator can be defined as a function of the sample data, \(\hat{\theta} = t(\treat{x}_{\obs{1}}, \dots, \treat{x}_{\sample{n}})\), that maps a sample to an estimate of \(\theta\)
An estimator is of little use to us unless it has desirable statistical properties
What are the desirable properties of an estimator?
Small Sample Properties
Unbiasedness
Efficiency
Large Sample Properties
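Unbiasedness means the estimator's expected value equals the true parameter, \(E[\hat{\theta}] = \theta\). A quick simulation sketch (an illustration of the idea, not from the slides) contrasting the biased \(n\)-denominator variance estimator with R's `var()`, which divides by \(n - 1\):

```r
set.seed(1)  # assumed seed for reproducibility
n <- 10

# For each of 10,000 samples from N(0, 1) (true variance = 1),
# compute the sample variance both ways
sims <- replicate(10000, {
  x <- rnorm(n)
  c(biased   = sum((x - mean(x))^2) / n,  # divides by n
    unbiased = var(x))                    # divides by n - 1
})

# Averaging across repeated samples: the biased version centers near
# (n - 1) / n = 0.9, the unbiased version near the true value 1
rowMeans(sims)
```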
We will pick back up here on Thursday!